Apparatus and method for detecting a moving object in a sequence of color frame images

ABSTRACT

An apparatus and method for detecting a moving object in a sequence of color frame images is provided. The apparatus comprises a color normalizer for normalizing color components of the color frame image to produce a normalized frame image; a color transformer coupled to the color normalizer for color transforming the normalized frame image to a color transformed frame image; the color transformed frame image having intensity levels such that colors corresponding to said moving object are emphasized; a frame delay coupled to the color transformer for delaying the color transformed frame image by one frame; and a motion detector coupled to the color transformer and the frame delay for detecting the motion of the moving object and further intensifying the intensity levels of the color transformed frame image based on the detected motion.

TECHNICAL FIELD

The present invention relates to an apparatus and method for detecting a moving object in a motion picture, and more specifically, to an apparatus and method for detecting a face region from a motion picture by using the color and motion information of the face region.

BACKGROUND ART

Detecting a face region from a motion picture is one of the prerequisite steps in face recognition. Conventional schemes for detecting a face region have not been used widely, since those schemes are affected by background images, and the size and orientation of the face.

Detection of face regions may be performed by using the information of the shape, color, or motion of the faces.

Using shape information for detecting a face region may be performed by measuring variation of gray levels and applying the measured values to a priori information of the face. However, the scheme should be applied only to images containing the front side of a face, and the detection result is largely affected by background images, and the size and orientation of the face.

The scheme of using color information suffers from racial deviations of face colors, because the scheme detects a face region by using the inherent color of human faces. Further, the scheme requires a large amount of data processing, since it uses much more information than the scheme of using shape information. However, it has become more applicable than the scheme of using shape information, as hardware technology has recently developed.

The motion information may also be used for detecting face regions in motion pictures wherein an object, i.e., a face, is moving.

DISCLOSURE OF INVENTION

It is therefore a principal object of the invention to provide an apparatus and method for detecting a moving object wherein the detection is not affected by background images, and the size and orientation of the moving object.

It is another object of the invention to provide an apparatus and method for detecting a moving object including a face region wherein the detection may be performed faster than the prior art.

In accordance with one aspect of the present invention to achieve the aforementioned object, an apparatus for detecting a moving object in a sequence of color frame images is provided. The color frame images have a plurality of pixels, each having three color components. The apparatus comprises a color normalizer for normalizing color components of the color frame image to produce a normalized frame image, a color transformer coupled to the color normalizer for color transforming the normalized frame image to a first color transformed frame image, the first color transformed frame image having intensity levels such that pixels corresponding to said moving object are emphasized, a frame delay coupled to the color transformer for delaying the first color transformed frame image by one frame to produce a second color transformed frame image, and a motion detector coupled to the color transformer and the frame delay for detecting the motion of the moving object and further intensifying the intensity levels of the first color transformed frame image based on the detected motion.

BRIEF DESCRIPTION OF DRAWINGS

The present invention will become more apparent upon a detailed description of the best mode for carrying out the invention as rendered below. In the description to follow, references will be made to the accompanying drawings, where like reference numerals are used to identify like parts in the various views and in which:

FIG. 1 shows a block diagram of the overall system for detecting a moving object in a color motion picture in accordance with the present invention;

FIG. 2 shows a block diagram of the color normalizer 100 of FIG. 1;

FIG. 3 a shows a histogram of normalized color components in a typical face image;

FIG. 3 b shows a histogram of normalized color components in a face image modeled by using a 2-dimensional Gaussian distribution function;

FIG. 4 shows a block diagram of the color transformer 200 of FIG. 1;

FIG. 5 shows results of color transformation by using GFCD modeling of a typical face image;

FIG. 6 shows a block diagram of the motion detection block 400 of FIG. 1;

FIG. 7 shows motion detection results of two successive images which have a rectangular object; and

FIG. 8 shows motion detection results of two successive images having a human face.

BEST MODE FOR CARRYING OUT THE INVENTION

FIG. 1 shows a block diagram for illustrating the overall system for detecting a moving object in a color motion picture in accordance with the present invention. The system comprises a color normalizer 100 for normalizing color components of a current frame image of the color motion picture; a color transformer 200 for color transforming the normalized frame image to a color transformed frame image; a frame delay 300 for temporarily storing the color transformed frame image from the color transformer 200 to delay the color transformed frame image by one frame; and a motion detector 400 for receiving the color transformed current and previous frame images from the color transformer 200 and the frame delay 300 and detecting the motion of the moving object. The color transformed frame image, which is produced by the color transformer 200, has intensity levels such that pixels corresponding to the moving object are emphasized. The motion detector 400 uses the detected motion of the moving object to further intensify the intensity levels of pixels of the moving object.

Referring now to FIG. 2, the operation of the color normalizer 100 of FIG. 1 will be explained.

In the color model of the RGB color space, for example, any color information Q may be represented by three color components, i.e., red, green, and blue color components R, G, and B, having different wavelengths. In other words, the color information Q may be given by Q=(R, G, B). A luminance value Y may be defined by the sum of the three color components. The luminance value Y is a measure of the visible radiant energy which produces the sensation of brightness. In order to prevent the brightness of an image from affecting the color components of the image, the color components of each pixel must be normalized by the luminance value Y of the pixel.

FIG. 2 shows a block diagram of the color normalizer 100. The color normalizer 100 comprises a luminance extractor 130 and two normalizers 110 and 120. The luminance extractor 130 evaluates a luminance value Y by summing the red, green, and blue color components R, G, and B for each pixel of a frame image, i.e., Y=R+G+B, and outputs it. Then, the normalizer 110, for example, a red component normalizer, receives the luminance value Y and outputs a normalized color component r by using the luminance value Y and the red color component R, wherein r is given by r=(R/Y)*255. Similarly, the normalizer 120, for example, a green component normalizer, receives the luminance value Y and outputs a normalized color component g, which is given by g=(G/Y)*255. Hereinabove, it is assumed that the luminance value has a depth of 8 bits, and the brightest pixel has the luminance value of 255. By the normalization, any color information Q may be represented by two normalized color components r and g, i.e., Q=(r, g), since the three normalized color components r, g, and b have the relationship r+g+b=255. In this way, the color model of the normalized color space can represent 256²/2 colors, while the color model of the RGB color space represents 256³ colors.
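The normalization performed by the luminance extractor 130 and the normalizers 110 and 120 may be sketched in software as follows. This is only an illustrative sketch: the function name `normalize_pixel` is hypothetical, and the handling of a black pixel (Y=0), which the description does not address, is an assumption.

```python
def normalize_pixel(R, G, B):
    """Return the normalized components (r, g) of one RGB pixel on a 0..255 scale."""
    Y = R + G + B  # luminance extractor 130: Y = R + G + B
    if Y == 0:
        # Assumption: a black pixel carries no color information, so return zeros.
        return (0.0, 0.0)
    r = 255 * R / Y  # normalizer 110: r = (R / Y) * 255
    g = 255 * G / Y  # normalizer 120: g = (G / Y) * 255
    return (r, g)
```

Because both components are divided by Y, doubling the brightness of a pixel, e.g., from (50, 40, 30) to (100, 80, 60), leaves (r, g) unchanged.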

FIG. 3 a shows a histogram of normalized color components in a typical face image. In FIG. 3 a, the two horizontal axes represent the two normalized color component values r and g, and the vertical axis represents the number of pixels having normalized color components of r and g in a typical face image. Distributions of the normalized color components are similar for all face images regardless of backgrounds and brightness.

FIG. 3 b shows a histogram of normalized color components in a face image which is modeled by a 2-dimensional Gaussian distribution function. In this embodiment of the invention, the modeling is performed by using a 2-dimensional Gaussian distribution as shown in FIG. 3 b. The Gaussian model for color distribution in face images is called Generalized Face Color Distribution (GFCD). GFCD is defined by mean m and variance Σ², in other words, GFCD is represented by GF(m, Σ²). In this representation, m=(/r, /g) is the center point of the 2-dimensional Gaussian distribution, wherein /r and /g are mean values of normalized color components for all pixels in a face image. Σ² represents a covariance matrix of normalized color components in a face image. In a preferred embodiment, a GFCD for face images may have m=(105, 95) and Σ² of σ_r=20 and σ_g=15. By using the GFCD model, an input color image containing a face may be transformed to a gray-level image where pixels having the color of the face are emphasized.

The color transformation using a color distribution model transforms normalized color components of pixels to values which represent the proximity of the color components to the mean m=(/r, /g) of the Gaussian distribution in the normalized color space. The color transformation is defined as follows:

$$f: R^{2} \rightarrow R^{1}, \qquad Z(x,y) = GF\left(r(x,y),\, g(x,y)\right), \quad (x,y) \in I \qquad \text{(Eq. 1)}$$

wherein (x, y) is a coordinate of a pixel in the face image, g(x, y) and r(x, y) are normalized color components of the pixel at the coordinate (x, y), and GF( ) is the Generalized Face Color Distribution function. The color transformation using GFCD produces values which are proportional to the proximity of the color of the pixel to a human face color.

Referring to FIGS. 4 and 5, the operation of the color transformer 200 of FIG. 1 will be explained.

FIG. 4 shows a block diagram of the color transformer 200. The color transformer 200 comprises blocks 210 and 220, two normalization blocks 230 and 240, and an output block 250. Block 210 receives the normalized color component r from the normalizer 110 and compares it with σ_r, which is predetermined in accordance with a particular GFCD model. If the normalized color component r is within three times σ_r of /r, then the block 210 outputs r′=|r−/r|. Otherwise, the block 210 outputs a very large value in order to make Zr=0. Block 230 receives the output value r′ and outputs a transformed value Zr by referring to a look-up table (LUT) which contains the 1-dimensional normalized Gaussian GF(0, σ_r²). Similarly, block 220 receives the normalized color component g and compares it with σ_g. If the normalized color component g is within three times σ_g of /g, then the block 220 outputs g′=|g−/g|. Otherwise, the block 220 outputs a very large value in order to make Zg=0. Block 240 receives the output value g′ and outputs a transformed value Zg by referring to a look-up table (LUT) which contains the 1-dimensional normalized Gaussian GF(0, σ_g²). Output block 250 receives the Zr and Zg values, evaluates Z=(Zr*Zg)*255, and outputs Z. FIG. 5 shows results of color transformation by using GFCD modeling of a typical face image. As can be seen in the result image of FIG. 5, areas having human face color are emphasized.
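The data path of FIG. 4 may be sketched as follows, using the example GFCD parameters given above (m=(105, 95), σ_r=20, σ_g=15). The sketch rests on two assumptions: the "normalized" 1-dimensional Gaussian is taken to be scaled so that its peak equals 1, and the look-up tables of blocks 230 and 240 are replaced by a direct evaluation of the exponential. The function names are hypothetical.

```python
import math

R_MEAN, G_MEAN = 105.0, 95.0    # example GFCD mean m = (/r, /g) from the description
SIGMA_R, SIGMA_G = 20.0, 15.0   # example standard deviations from the description

def gaussian_1d(d, sigma):
    """1-D Gaussian of offset d, scaled so the peak is 1 (assumed normalization)."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def gated_response(value, mean, sigma):
    """Blocks 210/230 (or 220/240): three-sigma gate followed by the Gaussian."""
    d = abs(value - mean)
    if d > 3.0 * sigma:
        return 0.0  # outside the gate the response is forced to zero
    return gaussian_1d(d, sigma)

def color_transform(r, g):
    """Output block 250: Z = (Zr * Zg) * 255."""
    Zr = gated_response(r, R_MEAN, SIGMA_R)
    Zg = gated_response(g, G_MEAN, SIGMA_G)
    return Zr * Zg * 255.0
```

Under these assumptions a pixel whose normalized color equals the mean is mapped to Z=255, and any pixel lying outside the three-sigma gate in either component is mapped to Z=0.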

If any background images have a color which is similar to the color of a human face, the above method may emphasize those background images as well as the face area. In order to overcome this problem, the present method uses motion information of the human face. In general, only the human face area is moving while background images remain still. Thus, the human face area may be clearly distinguished from background images if motion information of the face area is utilized.

Two methods for extracting motion information from two successive frame images are well known in the art: the region based method and the feature point based method. Most applications in the art use the region based method rather than the feature point based method, since the latter generally requires post-processing which uses an interpolation technique.

In a typical region based method, the inter-frame differences of intensity levels of pixels adjacent a particular pixel are summed in order to measure the motion of the particular pixel. This method is known as the Accumulated Difference Measure (ADM) method. Since the ADM method uses the summation of intensity differences of neighboring pixels, it detects motion robustly against small noisy changes of neighboring pixels. However, pixels having small differences are ignored by the averaging effect, and the measurements are largely affected by the determination of the threshold value.

Thus, the present invention preferably performs motion detection by counting pixels whose inter-frame differences are larger than a threshold value in a window having a predetermined size. This motion detection measure will be called the Unmatching Pixel Count (UPC) measure and may be written as follows:

$$UPC(x,y,t) = \sum_{i=x-N}^{x+N} \sum_{j=y-N}^{y+N} U(i,j,t)$$

$$U(i,j,t) = \begin{cases} 1, & \text{if } \left| Z(i,j,t) - Z(i,j,t-1) \right| > Th \\ 0, & \text{otherwise} \end{cases} \qquad \text{(Eq. 2)}$$

wherein Th is a threshold value for determining whether the intensity values of pixels in two successive frame images are matched. In the UPC method, since matching is tested in a pixel by pixel manner and the results are accumulated, the averaging effect is reduced such that an overlapping area causing a small intensity change is clearly captured.
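The UPC measure of Eq. 2 may be sketched for a single pixel as follows. Frames are represented here as lists of rows indexed as `frame[y][x]`, which is a convention chosen for this sketch. Skipping window positions that fall outside the image is likewise an assumption, since the description does not address image borders.

```python
def upc(curr, prev, x, y, N, Th):
    """Count the pixels in the (2N+1)x(2N+1) window centered on (x, y)
    whose inter-frame difference |Z(i,j,t) - Z(i,j,t-1)| exceeds Th."""
    height, width = len(curr), len(curr[0])
    count = 0
    for j in range(y - N, y + N + 1):
        for i in range(x - N, x + N + 1):
            # Assumption: window positions outside the image are skipped.
            if 0 <= i < width and 0 <= j < height:
                if abs(curr[j][i] - prev[j][i]) > Th:
                    count += 1  # U(i, j, t) = 1
    return count
```

Each pixel is tested against Th individually before accumulation, so a change confined to a few pixels of the window still contributes to the count.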

In order to detect motion of a color image, the color information explained hereinabove may be used. It may be recognized by those skilled in the art that the intensity level of each pixel in the color transformed image, which was transformed by using the color transformation method of GFCD modeling, represents a probability that the pixel is in a face area. The color information of the color transformed image may be incorporated in Eq. 2 by weighting the counts with the intensity level of the pixel of the color transformed image. This new motion detection measure will be referred to as the Weighted Unmatching Pixel Count (WUPC). The WUPC(x, y, t) is evaluated by performing a fuzzy-AND operation between the UPC measure UPC(x, y, t) and the intensity level of the color transformed image Z(x, y, t). The WUPC measure is given by the following equation:

$$WUPC(x,y,t) = Z(x,y,t) \otimes \sum_{i=x-N}^{x+N} \sum_{j=y-N}^{y+N} U(i,j,t)$$

$$U(i,j,t) = \begin{cases} 1, & \text{if } \left| Z(i,j,t) - Z(i,j,t-1) \right| > Th \\ 0, & \text{otherwise} \end{cases} \qquad \text{(Eq. 3)}$$

wherein the operator $\otimes$ is the fuzzy-AND operator. This WUPC measure emphasizes motion of the pixels having a color which is similar to the color of a human face, while deemphasizing motion of the other pixels, i.e., pixels of background images.
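The weighting of Eq. 3 may be sketched as follows. The description does not specify which fuzzy-AND operator is used or how the two operands are brought to a common scale. The sketch therefore assumes the minimum operator and assumes that Z is scaled by 255 and the count by the window area, so that both lie in [0, 1]. The function names are hypothetical.

```python
def fuzzy_and(a, b):
    """Minimum operator on memberships in [0, 1] (an assumed choice of fuzzy-AND)."""
    return min(a, b)

def wupc(z_center, upc_count, N):
    """Combine color evidence and motion evidence for one pixel.

    z_center  : Z(x, y, t), color transformed intensity level in 0..255
    upc_count : UPC(x, y, t), unmatching pixels in the (2N+1)x(2N+1) window
    """
    window_area = (2 * N + 1) ** 2
    color_membership = z_center / 255.0          # assumed scaling to [0, 1]
    motion_membership = upc_count / window_area  # assumed scaling to [0, 1]
    return fuzzy_and(color_membership, motion_membership)
```

With this choice the output is large only when both the face-color evidence and the motion evidence are large, so a moving pixel of non-face color and a still pixel of face color are both suppressed.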

In Eq. 3, the threshold value Th may be obtained adaptively by using the transformed image value Z(x, y, t) for each pixel. In other words, Th becomes small for pixels which are in a face area and large for other pixels. In this way, the WUPC measure is more sensitive to motion of pixels in a face area than to motion of the other pixels, such that even a small motion of the face area may be detected. In a preferred embodiment of the present invention, the threshold Th may be obtained by using a Sigmoid function as follows:

$$Th(Z) = \frac{255}{1 + e^{\frac{Z(x,y,t) - 255/2}{Q}}} \qquad \text{(Eq. 4)}$$

The slope of the Sigmoid function becomes steeper as Q is lowered. Thus, the Sigmoid function approaches a step-function shape for small Q. It is appropriate to use the Sigmoid function for determining the threshold Th of Eq. 3, since the Sigmoid function outputs a small threshold Th for a large input value and a large Th for a small input value. Further, the Sigmoid function is nonlinear in that the function does not respond abruptly to maximum or minimum input values. This method for adaptively counting unmatched pixels will be called the Adaptive Weighted Unmatching Pixel Count (AWUPC) measure.
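The adaptive threshold of Eq. 4 may be evaluated as in the following sketch. The function name is hypothetical, and the clamp on the exponent is an implementation guard added here so that very small values of Q do not overflow the floating-point exponential; it is not part of Eq. 4.

```python
import math

def adaptive_threshold(Z, Q):
    """Th(Z) = 255 / (1 + exp((Z - 255/2) / Q)), per Eq. 4."""
    # Guard (not part of Eq. 4): clamp the exponent to avoid overflow in math.exp.
    exponent = min((Z - 255.0 / 2.0) / Q, 700.0)
    return 255.0 / (1.0 + math.exp(exponent))
```

For Z=127.5 the threshold is exactly 127.5 for any Q. For a bright (face-colored) pixel the threshold approaches 0, and for a dark pixel it approaches 255, with a sharper transition between the two as Q is lowered.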

FIG. 6 shows a block diagram of the motion detection block 400 of FIG. 1. The motion detection block 400 comprises line buffers 410 and 420 for temporarily storing color transformed images and outputting intensity levels of pixels in a window of a predetermined size; an adaptive threshold value generator 440 for generating the threshold Th in response to the intensity level of a central pixel in the window; an unmatching pixel value generator 430 for receiving the threshold Th and the present and previous frames of the color transformed image from the line buffers 410 and 420 to determine whether the pixels in the window of the present frame match the pixels in the window of the previous frame by evaluating the unmatching pixel value U(i, j, t) of Eq. 3; an unmatching pixel counter 450 for counting the pixels which are determined to be unmatching pixels in the unmatching pixel value generator 430; and a fuzzy-AND operator 460 for performing a fuzzy-AND operation on the output of the unmatching pixel counter 450 with the intensity level Z(x, y, t) of the pixel of the color transformed image from the line buffer 420.

The line buffer 420 outputs the intensity levels Z(i, j, t) in a window of the present frame, while the line buffer 410 outputs the intensity levels Z(i, j, t−1) in a window of the previous frame of the color transformed image. The window size is (2N+1)×(2N+1) such that the line buffers 410 and 420 each output the intensity levels of (2N+1)×(2N+1) pixels. The threshold value generator 440 receives the intensity level of the central pixel Z(x, y, t) of the window of the present frame and evaluates a threshold value Th according to Eq. 4. The unmatching pixel value generator 430 receives the outputs from the line buffers 410 and 420 and the threshold value Th from the threshold value generator 440. Then, the unmatching pixel value generator 430 evaluates U(i, j, t) of Eq. 3 for the window around the central pixel Z(x, y, t). The unmatching pixel counter 450 receives the outputs of the unmatching pixel value generator 430 and counts the number of unmatching pixels. The fuzzy-AND operator 460 outputs color motion information by performing a fuzzy-AND operation between the intensity level of the central pixel Z(x, y, t) from the line buffer 420 and the output from the unmatching pixel counter 450.
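The chain of blocks 410 through 460 may be sketched for one central pixel as follows. This is a software analogue only, under several assumptions: frames are lists of rows indexed as `frame[y][x]`, the window is clipped at the image border, the fuzzy-AND operator is taken to be the minimum, and both of its operands are scaled to [0, 1]. The function name and the default values of N and Q are hypothetical.

```python
import math

def awupc(curr, prev, x, y, N=1, Q=20.0):
    """AWUPC response at (x, y) for the present frame curr and previous frame prev."""
    height, width = len(curr), len(curr[0])
    z = curr[y][x]  # central pixel Z(x, y, t), as supplied by line buffer 420

    # Threshold value generator 440: Eq. 4 evaluated on the central pixel.
    exponent = min((z - 127.5) / Q, 700.0)  # clamp is an overflow guard only
    Th = 255.0 / (1.0 + math.exp(exponent))

    # Unmatching pixel value generator 430 and counter 450: Eq. 3 over the window.
    count = 0
    for j in range(max(0, y - N), min(height, y + N + 1)):
        for i in range(max(0, x - N), min(width, x + N + 1)):
            if abs(curr[j][i] - prev[j][i]) > Th:
                count += 1

    # Fuzzy-AND operator 460 (assumed: minimum of the two values scaled to [0, 1]).
    window_area = float((2 * N + 1) ** 2)
    return min(z / 255.0, count / window_area)
```

In this sketch a face-colored region that changes between frames yields a response near 1, whereas a face-colored region that is still, or a changing region of non-face color, yields a response of 0.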

Referring to FIGS. 7 and 8, motion detection results of the present invention will be compared with those of the prior art. FIG. 7 shows motion detection results of two successive images which have a rectangular object. FIG. 7 a shows the first image having a rectangular object with background images. Pixels in the rectangular object of FIG. 7 a have random gray levels of 200–230, while the gray levels of pixels in the background images are random values of 0–30. FIG. 7 b shows the second image wherein the rectangular object is moved by 50 pixels in the x and y directions. FIG. 7 c shows the motion detection results according to the conventional ADM method. The areas whose inter-frame differences of intensity levels are large are emphasized. However, those areas where the rectangular objects of the two successive images overlap are not emphasized. Thus, the motion is not well detected by the conventional ADM method when the objects of two successive frames overlap. FIG. 7 d shows a motion detection result according to the UPC measure. In FIG. 7 d, those areas where the rectangular objects of the two successive images overlap are also emphasized. The UPC measure can detect motion of a moving object better than the ADM method, since it is able to detect overlapping areas of a moving object. FIG. 7 e shows a motion detection result according to the WUPC measure. The areas of the moving object having a desired color in the first image are emphasized. FIG. 7 f shows the motion detection results obtained by using the AWUPC measure according to the present invention.

FIG. 8 shows motion detection results of two successive images having a human face. FIG. 8 a shows a GFCD color transformed image of the first image having a human face with background images. FIG. 8 b shows a GFCD color transformed image of the second image wherein the human face is moved slightly. FIG. 8 c shows the motion detection results according to the conventional ADM method. The motion is not well detected by the conventional ADM method as shown in FIG. 8 c. FIG. 8 d shows a motion detection result according to the UPC measure. FIGS. 8 e and 8 f respectively show motion detection results according to the WUPC measure and the AWUPC measure in accordance with the present invention.

1. An apparatus for detecting a moving object in a sequence of color frame images, said color frame images having a plurality of pixels, each pixel having three color components, comprising: a color normalizer for normalizing color components of each color frame image to produce a normalized color frame image; a color transformer coupled to said color normalizer for color transforming said normalized color frame image to a first color transformed frame image, said first color transformed frame image having intensity levels such that pixels corresponding to said moving object are emphasized; a frame delay coupled to said color transformer for delaying said first color transformed frame image by one frame, said delayed first color transformed frame image being a second color transformed frame image; a motion detector coupled to said color transformer and said frame delay for detecting the motion of the moving object and further intensifying the intensity levels of said first color transformed frame image based on the detected motion; and the intensity level of said each pixel of said first color transformed frame image and the normalized color components of the pixel has a relationship as follows:

$$Z(x,y) = GF\left(r(x,y),\, g(x,y)\right), \quad (x,y) \in I$$

where (x,y) is a coordinate of said pixel in said normalized frame image, Z(x,y) is the intensity level of said pixel of said first color transformed frame image at the coordinate (x,y), r(x,y) and g(x,y) are normalized color components of said pixel at the coordinate (x,y), and GF( ) is a 2-dimensional Gaussian distribution function.

2. The apparatus of claim 1, wherein: said motion detector comprises means for detecting the motion of each pixel by counting pixels adjacent said each pixel whose intensity level differences between said first and second color transformed frame images are larger than a threshold value; and said intensity level of each pixel is further intensified by weighting said intensity level in accordance with said detected motion of said each pixel.

3. The apparatus of claim 2, wherein said weighting is performed by fuzzy-AND operating said intensity level with said detected motion for said each pixel.

4. The apparatus of claim 2, wherein said threshold value is obtained by using a Sigmoid function as follows:

$$Th(Z) = \frac{255}{1 + e^{\frac{Z(x,y,t) - 255/2}{Q}}}$$

wherein Z(x,y,t) is the intensity level of a pixel and Q is a predetermined parameter.